Agent Engineering Roadmap

What you get

Course

Full syllabus, tracks, graduation criteria, and a module-to-practice map.

Runnable Examples

Minimal Python examples for agents, tools, MCP, memory, workflow, colonies, evals, and RAG.

Benchmarks

Lightweight behavior checks for tool use, RAG, workflow, security, and observability.

Lesson Plans

Instructor-ready 90-minute teaching plans for every module.

Study Groups

4-week, 8-week, and workshop formats for teams and learning cohorts.

Showcases

Healthcare, finance, and enterprise demos with sample outputs.

Domain Casebooks

Realistic healthcare, finance, and enterprise cases with risk boundaries and evals.

The mental model

An agent is not magic. It is context, tools, memory, workflow, evaluation, and human approval arranged around a useful task.

Run locally

Verify Examples

Run every dependency-free example and showcase with one command.

Open script

Mini RAG

Inspect retrieval, grounded answering, no-answer behavior, and RAG evaluation.

Open example

Evaluation Harness

Run regression checks that separate correctness, format, and safety behavior.

Open example

Benchmark Runner

Check tool use, RAG grounding, approval gates, injection defense, and traces.

Open benchmarks

Graph Approval Agent

Inspect explicit graph transitions, high-risk routing, and approval gates.

Open example

Observable Agent

Trace decisions, tool calls, guardrails, and replayable incident evidence.

Open example

Injection Defense

Block unsafe instructions from retrieved content before they reach the agent.

Open example

Cost-Aware Agent

Route tasks by budget, latency target, and required answer quality.

Open example

Durable Workflow

Checkpoint work, resume safely, and preserve artifacts after interruption.

Open example

Modern MCP Gateway

Practice tools, resources, prompts, authorization, and elicitation.

Open example

Memory Governance

Redact PII, merge repeated memories, decay confidence, and delete sensitive records.

Open example

Agent Permissions

Assign owners, scopes, risk tiers, access reviews, and audit logs to agents.

Open example

Advanced Eval

Run regression, safety, adversarial, and golden trace checks before release.

Open example

Incident Playbook

Use traces, containment, hotfix evals, and postmortems when an agent fails.

Open playbook

Product UX

Design approval, evidence, control, recovery, and trust patterns for agent products.

Open checklist

Operating Model

Register, review, monitor, and retire enterprise agents with clear ownership.

Open checklist

Release Kit

Use release, v1 readiness, and deployment review checklists before publishing changes.

Open release kit

Governance Templates

Start with registry, risk assessment, and deployment review templates.

Open templates

Paper Roadmap

Study agent papers from OpenAI, Google, Meta, Anthropic, Microsoft-adjacent ecosystems, Stanford, Princeton, and Tsinghua.

Open papers

Paper Notes

Read concise engineering takeaways for ReAct, Toolformer, WebGPT, RAG, Reflexion, Voyager, AgentBench, SWE-agent, and safety papers.

Open notes

DeepEval And RAGAS

Compare practical frameworks for LLM app tests, RAG metrics, safety checks, regression gates, and CI evaluation.

Open guide

Open Source Map

Explore agent frameworks, MCP projects, RAG tools, eval systems, observability, and ops infrastructure.

Open projects

Framework Matrix

Choose agent frameworks by control flow, state, tools, evaluation, observability, safety, and operations.

Open matrix

Repo Reading Guide

Use a 30-minute reading loop to extract architecture lessons from open-source agent repositories.

Open guide

Portfolio Projects

Build GitHub-ready agent projects with evals, traces, safety boundaries, and architecture writeups.

Open projects

Capstone Starter

Start the final project from a runnable colony scaffold with regression evals.

Open starter

Showcase demos

Healthcare Colony

Routes health education requests with safety boundaries and escalation rules.

Open demo

Finance Research

Compares companies as research support without personalized investment advice.

Open demo

Enterprise Support

Classifies support tickets, routes work, and identifies approval needs.

Open demo

Domain Casebooks

Study realistic safety and evaluation cases for healthcare, finance, and enterprise agents.

Open casebooks